Only 20% of people can solve this three-question IQ test backed by MIT... are you one of them?

Daily Mail - Science & tech

The world's shortest IQ test is just three questions long and can tell if you're smarter than 80 percent of the population. Called the Cognitive Reflection Test (CRT), it has been around since 2005 but recently gained popularity on social media, with one TikTok user's breakdown of the three questions getting 14 million views. The test was created by psychologist Shane Frederick, now at the Yale School of Management, to help predict whether people are likely to make common mistakes in thinking and decision-making. Since its creation, multiple studies over the last two decades have tested thousands of college students, finding that fewer than 20 percent can get all three right.


A Data Analysis of the LoRA Dataset. Project page: https://lora-vqa.github.io/

Neural Information Processing Systems

Each question-and-answer group has a unique list of corresponding visuals used for image creation. The list of visible objects combines the correct-answer objects with an arbitrary 'noise' object.


All major AI models risk encouraging dangerous science experiments

New Scientist

The use of AI models in scientific laboratories risks enabling dangerous experiments that could cause fires, explosions or poisoning, researchers have warned. Such models offer a convincing illusion of understanding but are susceptible to missing basic and vital safety precautions. In tests of 19 cutting-edge AI models, every single one made potentially deadly mistakes. Serious accidents in university labs are rare but certainly not unheard of.


Training Chain-of-Thought via Latent-Variable Inference

Neural Information Processing Systems

Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a chain-of-thought (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training set. Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers; these rationales are expensive to produce by hand. Instead, we propose a fine-tuning strategy that tries to maximize the \emph{marginal} log-likelihood of generating a correct answer using CoT prompting, approximately averaging over all possible rationales. The core challenge is sampling from the posterior over rationales conditioned on the correct answer; we address it using a simple Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm inspired by the self-taught reasoner (STaR), memoized wake-sleep, Markovian score climbing, and persistent contrastive divergence. This algorithm also admits a novel control-variate technique that drives the variance of our gradient estimates to zero as the model improves. Applying our technique to GSM8K and the tasks in BIG-Bench Hard, we find that this MCMC-EM fine-tuning technique typically improves the model's accuracy on held-out examples more than STaR or prompt-tuning with or without CoT.
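The latent-variable view above can be made concrete with a toy discrete example. The sketch below (function names are illustrative, not from the paper's code) shows the two quantities the abstract refers to: the marginal log-likelihood log p(y|x) = log Σ_z p(z|x) p(y|x,z) being maximized, and the posterior over rationales p(z|x,y) that the MCMC-EM procedure must sample from.

```python
import math

def posterior_over_rationales(prior, likelihood):
    # E-step quantity: p(z | x, y) ∝ p(z | x) · p(y | x, z),
    # normalized over a small discrete set of candidate rationales z.
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

def marginal_log_likelihood(prior, likelihood):
    # The objective: log p(y | x) = log Σ_z p(z | x) · p(y | x, z),
    # i.e. the average over all possible rationales rather than any single one.
    return math.log(sum(p * l for p, l in zip(prior, likelihood)))
```

In the real setting z ranges over free-form text, so the sum is intractable and the posterior is approximated by MCMC; this toy only illustrates which distributions play which role.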


ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

Neural Information Processing Systems

Recent methodologies in LLM self-training mostly rely on the LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step values to train policy and reward models. ReST-MCTS* circumvents the per-step manual annotation typically used to train process rewards by tree-search-based reinforcement learning: given oracle final correct answers, ReST-MCTS* is able to infer the correct process rewards by estimating the probability that a step can help lead to the correct answer. These inferred rewards serve dual purposes: they act as value targets for further refining the process reward model and also facilitate the selection of high-quality traces for policy model self-training. We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget. We then show that by using traces searched by this tree-search policy as training data, we can continuously enhance the three language models for multiple iterations, and outperform other self-training algorithms such as ReST$^\text{EM}$ and Self-Rewarding LM.
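The inferred process reward described above can be sketched as a Monte-Carlo estimate: the value of a partial reasoning step is the fraction of policy rollouts from that step that end in the oracle-correct answer. In this illustrative toy, `rollout` is a hypothetical stand-in for completing the trace with the policy model; the real method embeds this estimate inside MCTS* rather than flat sampling.

```python
import random

def estimate_step_value(partial_trace, rollout, correct_answer,
                        n_rollouts=1000, seed=0):
    # Fraction of completions from this partial step that reach the
    # oracle-correct final answer: an estimate of P(correct | step),
    # usable as a per-step value target for the process reward model.
    rng = random.Random(seed)
    hits = sum(rollout(partial_trace, rng) == correct_answer
               for _ in range(n_rollouts))
    return hits / n_rollouts
```

Steps with a high estimated value are both better value targets for the reward model and better candidates for the self-training trace set.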


STaR: Bootstrapping Reasoning With Reasoning

Neural Information Processing Systems

Generating step-by-step chain-of-thought rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the Self-Taught Reasoner (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
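The loop the abstract describes can be sketched in a few lines. This is a toy version of one STaR iteration under stated assumptions: `generate` and `rationalize` are hypothetical stand-ins for sampling from the model without and with the correct answer as a hint, and only rationales that ultimately reach the correct answer enter the fine-tuning set.

```python
def star_round(questions, answers, generate, rationalize):
    # One STaR iteration (toy sketch):
    # 1) sample a rationale and answer for each question;
    # 2) on a wrong answer, retry with the correct answer as a hint
    #    ("rationalization");
    # 3) keep only (question, rationale, answer) triples whose answer
    #    is correct as the fine-tuning data for this round.
    finetune_set = []
    for q, a in zip(questions, answers):
        rationale, pred = generate(q)            # forward pass, no hint
        if pred != a:
            rationale, pred = rationalize(q, a)  # second try, answer as hint
        if pred == a:
            finetune_set.append((q, rationale, a))
    return finetune_set
```

In the full method, the model is then fine-tuned on this set and the loop repeats, restarting each round from the base model to avoid compounding drift.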


Learning to Reason in LLMs by Expectation Maximization

Lee, Junghyun, Kveton, Branislav, Choudhary, Sunav, Mukherjee, Subhojyoti, Rao, Anup, Rossi, Ryan A., Siu, Alexa

arXiv.org Machine Learning

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive an expectation-maximization (EM) objective for learning to reason. This view connects EM and modern reward-based optimization, and shows that the main challenge lies in designing a sampling distribution that generates rationales that justify correct answers. We instantiate and compare several sampling schemes: rejection sampling with a budget, self-taught reasoner (STaR), and prompt posterior sampling (PPS), which only keeps the rationalization stage of STaR. Our experiments on the ARC, MMLU, and OpenBookQA datasets with the Llama and Qwen models show that the sampling scheme can significantly affect the accuracy of learned reasoning models. Despite its simplicity, we observe that PPS outperforms the other sampling schemes.
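Of the sampling schemes compared above, rejection sampling with a budget is the simplest to state precisely. The sketch below is a toy version under stated assumptions: `sample` stands in for the LLM's chain-of-thought sampler, and only rationales whose final answer matches the label are kept for training.

```python
def rejection_sample(question, answer, sample, budget=8):
    # Rejection sampling with a budget: draw up to `budget`
    # (rationale, predicted answer) pairs and keep the rationales
    # whose answer matches the label. Kept rationales are the ones
    # that "justify the correct answer" in the EM view above.
    kept = []
    for i in range(budget):
        rationale, pred = sample(question, i)
        if pred == answer:
            kept.append(rationale)
    return kept
```

STaR and PPS differ in what they do when no draw succeeds: STaR resamples with the answer as a hint, while PPS keeps only that rationalization stage.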


Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models

Kim, Singon

arXiv.org Artificial Intelligence

However, retrieved documents often include information that is either irrelevant to answering the query or misleading due to factually incorrect content, despite having high relevance scores. This behavior indicates that abstractive compressors are more likely to omit important information essential for the correct answer, especially in long contexts where attention dispersion occurs. To address this issue, we categorize retrieved documents in a more fine-grained manner and propose Abstractive Compression Robust against Noise (ACoRN), which introduces two novel training steps. First, we use offline data augmentation on the training dataset to enhance compressor robustness against two distinct types of retrieval noise. Second, since the language model-based compressor cannot fully utilize information from multiple retrieved documents and exhibits positional bias, we perform fine-tuning to generate summaries centered around key information that directly supports the correct answer. Our experiments demonstrate that T5-large, trained with ACoRN as a compressor, improves EM and F1 scores while preserving the answer string, which could serve as direct evidence.
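The offline augmentation step can be sketched as follows. This is an illustrative construction in the spirit of the description above (all names hypothetical, not the paper's code): mix answer-supporting evidence with the two noise types, irrelevant and misleading documents, and shuffle positions so the compressor cannot exploit document order, while the target summary stays centered on the evidence that directly supports the correct answer.

```python
import random

def build_noisy_example(query, evidence_docs, irrelevant_docs,
                        misleading_docs, seed=0):
    # Offline augmentation (toy sketch): combine evidence with both
    # noise types and shuffle, so robustness cannot come from position.
    rng = random.Random(seed)
    docs = list(evidence_docs) + list(irrelevant_docs) + list(misleading_docs)
    rng.shuffle(docs)
    # Target summary is built from the answer-supporting evidence only,
    # matching the goal of preserving the answer string.
    target = " ".join(evidence_docs)
    return {"query": query, "documents": docs, "target": target}
```

Fine-tuning a compressor on such pairs teaches it to drop both noise types while keeping the evidence verbatim.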